Data collected as part of “Decisions about chocolate are processed differently than decisions on gambles: Evidence from eye-tracking” by Betty E. Kim-Viechnicki.

Exploring Conjoint Analysis Data

This dataset contains the results of an experiment in which participants were asked to choose one of three chocolates that were displayed on a screen. The information that was provided was the brand, type of chocolate, and the price. Below is an example of the choice that was presented.

Caption for the picture.

Each trial is a set of three choices given to a participant. Each choice consists of three attributes including brand, type, and price. Additional variables include the number of times the respondent fixated on any one of the attributes and whether the product was chosen.

Load Libraries and Data

We start by loading the dataset and necessary libraries.

library(plotly)

data <- as.data.frame(read.csv("http://goo.gl/GC1gRs"))

Effects coding was used to distinguish between levels. The following table shows the coding for five brands using four variables. Although Hershey is not a variable in the dataset, it is accounted for by setting the four brand variables to -1.

Brand B_Dove B_Lindt B_Godiva B_Ghirardelli
Dove 1 0 0 0
Lindt 0 1 0 0
Godiva 0 0 1 0
Ghirardelli 0 0 0 1
Hershey -1 -1 -1 -1

The same method is used to code the type of chocolate.

Brand T_MilkNuts T_Dark T_DarkNuts T_White
MilkNuts 1 0 0 0
Dark 0 1 0 0
DarkNuts 0 0 1 0
White 0 0 0 1
Milk -1 -1 -1 -1

For convenience, we create two factor variables “brand” and “type” that show the individual brand and types. By default, R will automatically code the variable levels.

data$brand <- as.factor(ifelse(data$B_Dove==1,"Dove",
                     ifelse(data$B_Lindt==1,"Lindt",
                            ifelse(data$B_Godiva==1,"Godiva",
                                   ifelse(data$B_Ghirardelli==1,"Ghirardelli","Hershey")))))
data$type <- as.factor(ifelse(data$T_MilkNuts==1,"MilkNuts",
                    ifelse(data$T_Dark==1,"Dark",
                           ifelse(data$T_DarkNuts==1,"DarkNuts",
                                  ifelse(data$T_White==1,"White","Milk")))))

This chunk performs basic counts on the dataset.

nrow(data)                                        # 1050 observations
length(unique(data$Ind))                          # 14 Individuals (Ind) participated
aggregate(data$Trial,list(indi=data$Ind),length)  # 75 each number of trials each Ind participated in

Exploring the Data

We start by exploring the dataset using the summary() function.

summary(data)
##       Ind           Trial         Alt        B_Dove         
##  Min.   :2401   Min.   : 1   Min.   :1   Min.   :-1.000000  
##  1st Qu.:2405   1st Qu.: 7   1st Qu.:1   1st Qu.: 0.000000  
##  Median :2410   Median :13   Median :2   Median : 0.000000  
##  Mean   :2409   Mean   :13   Mean   :2   Mean   : 0.006667  
##  3rd Qu.:2413   3rd Qu.:19   3rd Qu.:3   3rd Qu.: 0.000000  
##  Max.   :2417   Max.   :25   Max.   :3   Max.   : 1.000000  
##     B_Lindt              B_Godiva        B_Ghirardelli  
##  Min.   :-1.0000000   Min.   :-1.00000   Min.   :-1.00  
##  1st Qu.: 0.0000000   1st Qu.: 0.00000   1st Qu.: 0.00  
##  Median : 0.0000000   Median : 0.00000   Median : 0.00  
##  Mean   : 0.0009524   Mean   : 0.02667   Mean   :-0.02  
##  3rd Qu.: 0.0000000   3rd Qu.: 0.00000   3rd Qu.: 0.00  
##  Max.   : 1.0000000   Max.   : 1.00000   Max.   : 1.00  
##    T_MilkNuts           T_Dark           T_DarkNuts         T_White       
##  Min.   :-1.00000   Min.   :-1.00000   Min.   :-1.0000   Min.   :-1.0000  
##  1st Qu.: 0.00000   1st Qu.: 0.00000   1st Qu.: 0.0000   1st Qu.: 0.0000  
##  Median : 0.00000   Median : 0.00000   Median : 0.0000   Median : 0.0000  
##  Mean   : 0.01429   Mean   : 0.04095   Mean   : 0.0219   Mean   : 0.0181  
##  3rd Qu.: 0.00000   3rd Qu.: 0.00000   3rd Qu.: 0.0000   3rd Qu.: 0.0000  
##  Max.   : 1.00000   Max.   : 1.00000   Max.   : 1.0000   Max.   : 1.0000  
##      Price         Brand_Fix         Type_Fix        Price_Fix     
##  Min.   :0.500   Min.   : 0.000   Min.   : 0.000   Min.   :0.0000  
##  1st Qu.:1.300   1st Qu.: 1.000   1st Qu.: 2.000   1st Qu.:0.0000  
##  Median :2.200   Median : 2.000   Median : 3.000   Median :1.0000  
##  Mean   :2.211   Mean   : 2.083   Mean   : 3.267   Mean   :0.9838  
##  3rd Qu.:3.175   3rd Qu.: 3.000   3rd Qu.: 4.000   3rd Qu.:2.0000  
##  Max.   :4.000   Max.   :18.000   Max.   :25.000   Max.   :7.0000  
##      Chosen               brand           type    
##  Min.   :0.0000   Dove       :214   Dark    :233  
##  1st Qu.:0.0000   Ghirardelli:186   DarkNuts:213  
##  Median :0.0000   Godiva     :235   Milk    :190  
##  Mean   :0.3333   Hershey    :207   MilkNuts:205  
##  3rd Qu.:1.0000   Lindt      :208   White   :209  
##  Max.   :1.0000

The summary shows that there are no missing values and variables that should be either -1,0,or 1, are indeed so. The summary for brand and type indicate that the number of times the brands and types came up are approximately equal.

The following graphs visualize the variables and how many times each attribute was chosen. As the code for these graphs is largely the same, chunks are hidden unless there is a significant change.

Chocolate Attributes

cBrand <- data.frame(xtabs(Chosen ~ brand , data=data))

f <- list(
  family = "Arial, sans",
  size = 18,
  color = "#7f7f7f"
)
x <- list(
  title = "Brand",
  titlefont = f
)
y <- list(
  title = "Number Chosen",
  titlefont = f
)
plot_ly(
  x = cBrand$brand
  , y = cBrand$Freq
  , type = "bar"
  , filename="r-docs/knitr-example"
) %>%
layout(title = "Chosen by Brand" , xaxis = x, yaxis = y)
## Warning in plot_ly(x = cBrand$brand, y = cBrand$Freq, type = "bar",
## filename = "r-docs/knitr-example"): Ignoring filename. Use plotly_POST()
## if you want to post figures to plotly.

So what?

Chosen by brand shows that Godiva gets chosen more frequently than other brands and Hershey is the least chosen. However,it may be premature to conclude that Godiva is the best brand. Without understanding the other attributes that were available in the trial, we may be mistaking correlation for causation.

So what?

Chosen by type reveals that people do not like white chocolate. Out of the 350 chocolates that were chosen throughout the entire experiment, white chocolate was only picked 26 times. The next least picked chocolate type is milk chocolate with nuts with 68 being picked. Although we can see the stark difference in white chocolate, it seems that the other four types of chocolates are chosen at roughly the same frequency.

So what?

Chosen by price shows that as price increases the number of times the chocolate gets chosen decreases.The prices that were chosen for the experiment range from $0.50 and $4. Although these prices may be typical for chocolate, it should be investigated whether this range is appropriate for a conjoint analysis. One assumption that is made for under a conjoint analysis is that the attributes levels are linear with respect to the response variable. This assumption is further tested in the analysis section.

The following visualization shows a histogram of prices for both alternatives that were chosen and not chosen. The difference in histogram clearly shows that less expensive chocolates (<$2) are chosen more frequently than more expensive chocolate. FOr chocolates that are greater than $2, there is less seperation which indicates price has a different decision impact on more expensive chocolate than on less expensive chocolate.

chosenPrices <- data$Price[which(data$Chosen==1)]
notChosenPrices <- data$Price[which(data$Chosen==0)]
x <- list(
  title = "Pricing Bins",
  titlefont = f
)
y <- list(
  title = "Number Chosen",
  titlefont = f
)
plot_ly(x=chosenPrices , opacity = 0.6 , type = "histogram" , name="Chosen") %>%
  add_trace(x=notChosenPrices , name="Not Chosen") %>%
  layout(barmode="overlay",title="Chosen and Not Chosen Price Histograms")

Eye Tracking

An aggregated and interactive visualization of fixations over the attributes is available here: (https://pawelb.shinyapps.io/chocolate_slider/)

cBrandFix <- data.frame(table(data$Brand_Fix,data$Chosen))
cTypeFix <- data.frame(table(data$Type_Fix,data$Chosen))
cPriceFix <- data.frame(table(data$Price_Fix,data$Chosen))


x <- list(
  title = "Brand_Fix",
  titlefont = f
)
y <- list(
  title = "Number Chosen",
  titlefont = f
)

plot_ly(
  x = cBrandFix$Var1
  , y = cBrandFix$Freq[which(cBrandFix$Var2==1)]
  , opacity = 0.6
  , type = "bar"
  , name = "Brand"
  ) %>%
add_trace(x=cTypeFix$Var1
          ,y=cTypeFix$Freq[which(cTypeFix$Var2==1)]
          ,name="Type"
          ) %>%
add_trace(x=cPriceFix$Var1
          ,y=cPriceFix$Freq[which(cPriceFix$Var2==1)]
          ,name="Price"
          )  %>%
layout(barmode="stack" , title = "Chosen by Price_Fix" , xaxis = x, yaxis = y)

So what?

This graphic visualizes the relation between the attribute and the number of times the individual look at each. The eye tracking data indicates how many times the individual looked at each attribute before making a decision. The biggest difference between the attributes is the range of values on the x-axis. Price ranges from 0 to 6 whereas brand and type are looked at more often with range between 0 and 25.

The following visual overlays two graphs that show how many chocolates were chosen and not chosen at given prices.

cBT <- xtabs(Chosen ~ brand + type , data=data)

chosenPrices <- data$Price[which(data$Chosen==1)]
notChosenPrices <- data$Price[which(data$Chosen==0)]

plot_ly(x=chosenPrices , opacity = 0.6 , type = "histogram" , name="Chosen") %>%
  add_trace(x=notChosenPrices , name="Not Chosen") %>%
  layout(barmode="overlay")